Protein classification using modified n-grams and skip-grams
نویسندگان
چکیده
منابع مشابه
Protein classification using modified n-grams and skip-grams.
Motivation Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce...
متن کاملComparing Medline citations using modified N-grams
OBJECTIVE We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information. MATERIALS AND METHODS Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches which take account of the fact that the document may have been altered. These are: (1) d...
متن کاملModeling Harmony with Skip-Grams
String-based (or viewpoint) models of tonal harmony often struggle with data sparsity in pattern discovery and prediction tasks, particularly when modeling composite events like triads and seventh chords, since the number of distinct n-note combinations in polyphonic textures is potentially enormous. To address this problem, this study examines the efficacy of skip-grams in music research, an a...
متن کاملSkip N-grams and Ranking Functions for Predicting Script Events
In this paper, we extend current state-of-theart research on unsupervised acquisition of scripts, that is, stereotypical and frequently observed sequences of events. We design, evaluate and compare different methods for constructing models for script event prediction: given a partial chain of events in a script, predict other events that are likely to belong to the script. Our work aims to answ...
متن کاملBiological Named Entity Recognition Using n-grams and Classification Methods
We propose a biological named entity recognition system which uses classification methods and a n-gram model to annotate terms in text. A novel method is presented to express lexical features in a pattern notation. Prefix and suffix characters are used instead of lists of potential terms or other external resources. Creating classification exemplars is conducted from text by using a word n-gram...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Bioinformatics
سال: 2017
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/btx823